
feat(endpoints): Add OpenAI Responses API endpoint with fixes and integration tests#43

Open
acere wants to merge 32 commits into awslabs:main from acere:ResponseAPI

Conversation

@acere
Collaborator

@acere acere commented Mar 25, 2026

Summary

Adds OpenAI Responses API endpoint support to LLMeter, with fixes to align with the actual API behavior.

Changes

Endpoint fixes (llmeter/endpoints/openai_response.py)

  • Rename max_tokens to max_output_tokens in create_payload (the Responses API parameter name)
  • Fix _parse_response to handle usage=None (Bedrock Mantle doesn't always return it) and use input_tokens/output_tokens with fallback to prompt_tokens/completion_tokens
  • Rewrite _parse_stream_response to process typed events (response.output_text.delta, response.completed) instead of the old chunk-with-output-array format
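The usage-handling fix above can be sketched roughly as follows. This is a hypothetical illustration (the function name `extract_token_counts` is invented here), not the actual code in llmeter/endpoints/openai_response.py:

```python
# Hypothetical sketch of the usage-fallback logic described above; the real
# implementation in _parse_response may differ in structure and naming.
def extract_token_counts(usage):
    """Return (input_tokens, output_tokens), tolerating usage=None and
    falling back to the legacy prompt_tokens/completion_tokens names."""
    if usage is None:  # Bedrock Mantle doesn't always return a usage block
        return None, None
    input_tokens = getattr(usage, "input_tokens", None)
    if input_tokens is None:
        input_tokens = getattr(usage, "prompt_tokens", None)
    output_tokens = getattr(usage, "output_tokens", None)
    if output_tokens is None:
        output_tokens = getattr(usage, "completion_tokens", None)
    return input_tokens, output_tokens
```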

Integration tests

  • Add tests/integ/test_response_endpoint.py — integration tests for ResponseEndpoint and ResponseStreamEndpoint wrappers against Bedrock Mantle
  • Fix tests/integ/test_response_bedrock.py to use ResponseUsage attribute names (input_tokens/output_tokens)

Unit test updates

  • Update all unit test mocks across 5 test files to use spec-based usage mocks (input_tokens/output_tokens) and event-based streaming mocks

Example notebook

  • Add examples/LLMeter with OpenAI Response API on Bedrock.ipynb demonstrating non-streaming and streaming usage with Runner and plotting

Testing

  • All 527 unit tests pass
  • Ruff lint clean

acere added 2 commits March 24, 2026 21:41
… test suite

- Add ResponseEndpoint and ResponseStreamEndpoint classes for OpenAI Responses API support
- Implement non-streaming and streaming response handling with proper error management
- Add structured output support with response format validation and serialization
- Create comprehensive unit test suite covering response parsing, error handling, format validation, model parameters, payload parsing, properties, and serialization
- Add integration tests for Bedrock response endpoint functionality
- Export new response endpoint classes from endpoints module
- Update integration test configuration with response endpoint fixtures
- Rename max_tokens to max_output_tokens in create_payload (Response API
  parameter name)
- Fix _parse_response to handle usage=None (Bedrock Mantle) and use
  input_tokens/output_tokens with fallback to prompt_tokens/completion_tokens
- Rewrite _parse_stream_response to process typed events
  (response.output_text.delta, response.completed) instead of the old
  chunk-with-output-array format
- Fix test_response_bedrock.py to use ResponseUsage attribute names
  (input_tokens/output_tokens)
- Add integration tests for ResponseEndpoint and ResponseStreamEndpoint
- Add example notebook for Response API on Bedrock
- Update all unit test mocks to match new behavior
@acere acere requested a review from athewsey March 25, 2026 01:50
@acere acere self-assigned this Mar 25, 2026
Comment on lines +16 to +17
ResponseEndpoint,
ResponseStreamEndpoint,
Collaborator

Shouldn't these be OpenAIResponseEndpoint and OpenAIResponseStreamEndpoint for consistency with the existing ChatCompletion ones? "ResponseEndpoint" seems very generic.

Collaborator Author

I agree. Updated.

try:
client_response = self._client.responses.create(**payload)
except APIConnectionError as e:
logger.error(e)
Collaborator

In bedrock_invoke and litellm we're using logger.exception(e), which also prints the stack trace... I'd suggest we standardize on one or the other when handling endpoint invocation errors into InvocationResponse.error_outputs?

Collaborator Author

go logger.exception(e)!
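The distinction the review thread settles on can be demonstrated in a minimal, standalone sketch (the `invoke_and_log` function and logger name are invented for illustration; they are not LLMeter code):

```python
import logging

logger = logging.getLogger("llmeter.example")

def invoke_and_log():
    """Minimal sketch contrasting the two logging calls discussed above."""
    try:
        raise ConnectionError("endpoint unreachable")
    except ConnectionError as e:
        # logger.error(e) would record only the message; logger.exception(e)
        # logs at ERROR level and appends the full traceback as well.
        logger.exception(e)
```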

max_output_tokens: int = 256,
instructions: str | None = None,
**kwargs,
) -> Dict:
Collaborator

Unlike boto3, the OpenAI Python SDK has pretty solid (and TypedDict-based) typings already... Should we even be creating this convenience method in LLMeter? Or just typing payload as ResponseCreateParams for this endpoint and encouraging users to build it via the OpenAI SDK directly?

(Same logic would apply to the existing ChatCompletions endpoint too)

Collaborator Author

Updated all OpenAI endpoint classes to leverage the SDK typing; that simplified some of the parsing gymnastics. I'm not in favor of sunsetting create_payload: it's not a hard requirement to create payloads using this method, but it offers an easy, consistent way to create tests across providers.
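Since TypedDicts such as the SDK's ResponseCreateParams impose no runtime cost, a plain dict can be passed straight to client.responses.create(**payload). A rough sketch of what building such a payload by hand looks like (the `build_payload` helper is invented for illustration and is not LLMeter's create_payload):

```python
# Illustrative only: a plain dict matching the Responses API request shape.
# The SDK's ResponseCreateParams TypedDict exists only for static typing,
# so a dict like this works directly with client.responses.create(**payload).
def build_payload(model, prompt, max_output_tokens=256, instructions=None):
    payload = {
        "model": model,
        "input": prompt,
        "max_output_tokens": max_output_tokens,
    }
    if instructions is not None:
        payload["instructions"] = instructions
    return payload
```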

# Configure OpenAI client with Bedrock Mantle endpoint for Response API
# Response API uses bedrock-mantle endpoint, not bedrock-runtime
base_url = f"https://bedrock-mantle.{aws_region}.api.aws/v1"
client = OpenAI(api_key=token, base_url=base_url)
Collaborator

Looks like this is just testing the OpenAI SDK and not the LLMeter endpoint??

Same for the streaming test below too

Collaborator Author

Indeed... fixed now.

@athewsey
Collaborator

athewsey commented Apr 6, 2026

Also, almost forgot: we should add the relevant module placeholder .md under the docs API reference.

acere and others added 24 commits April 8, 2026 22:40
- Replace Poetry with uv in GitHub Actions PyPI workflow for faster builds
- Update .gitignore to track uv.lock instead of poetry.lock
- Migrate pyproject.toml from Poetry format to standard PEP 621 format
- Update CONTRIBUTING.md with uv installation and development instructions
- Update README.md with uv installation examples for both basic and extras
- Simplify dependency management and build configuration
- Improve CI/CD performance and developer experience with uv tooling
…poetry in the documentation.

update test documentation.
- Upgrade astral-sh/setup-uv action from v4 to v7
- Update Python version requirement from <3.13 to <4 in pyproject.toml
- Add reference to tests/README.md in CONTRIBUTING.md for testing documentation
- Align with uv package manager migration and improve version flexibility
Use importlib.metadata to read the version from installed package
metadata, with a fallback to "0.0.0" when the package is not formally
installed. This fixes `AttributeError: module 'llmeter' has no
attribute '__version__'`.
Use the __name__ variable to retrieve LLMeter's version from
importlib, rather than hard-coding the module's name.
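A sketch of the version-lookup approach these two commits describe (the `get_package_version` name is invented here; the real code reads the package name from `__name__` rather than taking a parameter):

```python
# Read the installed package's version via importlib.metadata, falling back
# to "0.0.0" when the package is not formally installed (e.g. running from
# a plain source checkout without `pip install -e .`).
from importlib.metadata import PackageNotFoundError, version

def get_package_version(package_name):
    try:
        return version(package_name)
    except PackageNotFoundError:
        return "0.0.0"
```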
Update test payloads and JMESPath expressions in test_bedrock_invoke.py
to match Amazon Nova's native Invoke API format, since the default
BEDROCK_TEST_MODEL was changed from Claude to Nova in PR awslabs#36.

- Non-streaming: use output.message.content[0].text, usage.outputTokens
- Streaming: use contentBlockDelta.delta.text, metadata.usage.*Tokens
- Request payload: use schemaVersion messages-v1 and inferenceConfig

Fixes awslabs#38
onnxruntime 1.24.3 dropped Python 3.10 support, causing the release
workflow to fail. Bump the build environment to Python 3.12.
uv build only needs the build backend (hatchling), which it resolves
on its own. Installing all dev/test dependencies is unnecessary and
was pulling in onnxruntime which lacks Python 3.10 wheels.
Still lots of gaps to fill in
…nd fix build warnings

- Add metrics and statistics page with LLM latency concepts (TTFT, TTLT, TPOT),
  percentile reliability guidance, run-level stats, cost metrics, and visualization examples
- Add API reference pages for callbacks (base, cost, mlflow) and bedrock_invoke endpoint
- Update installation page with uv instructions, mlflow extra, OpenAI-compatible API description
- Fix broken relative links in index.md and key_concepts.md
- Add type annotations to fix all griffe warnings in mkdocs build
- Fix docstring issues (parameter name mismatch, indentation) in base.py and runner.py
- Pin mkdocs<2 to avoid incompatible upstream changes
- Add callbacks card to homepage
Move overall homepage within the User Guide instead of a confusing
separate tab. Add an API Reference home page.
We don't have github discussions enabled anyway
Add headers to module pages so they don't appear as 'index'. Add some
clarifying text to API reference home page. Add some missing pages
and fix associated griffe type warnings. Improve some docstrings.
As discussed at https://fpgmaas.com/blog/collapse-of-mkdocs/, MkDocs
has been unmaintained for some time and the new v2 will not support
Material for MkDocs, which we used for theming. Migrate to Zensical,
a project by the team behind Material for MkDocs that aims to offer
easy compatibility. Also, update the docs GitHub workflow to reflect
our moves from Poetry to uv and from MkDocs to Zensical.
Include section in contributing file to guide devs on how to preview
and maintain the documentation website.
Remove custom analytics placeholder page. Fill out 'run experiments'
placeholder page. Move unnecessarily folder-nested user guide pages
up to the root (URL won't change if we folder them again in future
when we have more content).
Add push trigger for main branch with path filters on docs/** and
mkdocs.yml so documentation updates are deployed without waiting for
a release.
…dencies

The docs build only needs mkdocstrings and zensical. Using --only-group
instead of --group skips the main project dependencies (torch, mlflow,
nvidia packages, etc.) that are not needed for static doc generation.
The deploy-pages action requires id-token: write to obtain the
ACTIONS_ID_TOKEN_REQUEST_URL needed for authentication.
- Include .github/workflows/docs.yml in path filter so workflow
  changes also trigger a docs build
- Add id-token: write permission required by deploy-pages action
- Use --only-group docs to skip unnecessary main dependencies
- Add environment declaration required by deploy-pages action
- Use --frozen on uv sync and --no-sync on uv run to prevent
  re-installing the full project dependencies during build
acere and others added 6 commits April 8, 2026 22:40
- Configure mkdocstrings Python handler with Google-style docstring
  parsing, source links, cross-references, and merged __init__ docs
- Add missing prompt_utils API reference page and nav entry
- Fix table column width issues causing awkward word splits in code
  tokens by keeping inline code on one line and setting min-width on
  description columns
- Update CONTRIBUTING.md with lightweight docs build instructions
  using uv sync --only-group docs
Going back to sorting attributes alphabetically in the API doc for
easier searching.
uv version without --no-sync modifies pyproject.toml and triggers an
automatic sync, resolving and installing all 280+ dependencies
unnecessarily in the publish workflow.
- Rename ResponseEndpoint -> OpenAIResponseEndpoint and
  ResponseStreamEndpoint -> OpenAIResponseStreamEndpoint for
  consistency with OpenAICompletionEndpoint naming convention
- Change logger.error() to logger.exception() for stack trace
  consistency with bedrock_invoke.py and litellm.py
- Rewrite test_response_bedrock.py to test LLMeter endpoint wrappers
  instead of raw OpenAI SDK
- Update serialization test assertions for new class names
- Update example notebook references
- Add docs/reference/endpoints/openai_response.md placeholder
- Add openai_response to mkdocs.yml nav under endpoints
- Update connect_endpoints user guide to mention Response API endpoints
- Type invoke() payload as CompletionCreateParams / ResponseCreateParams
- Type create_payload() return as SDK TypedDicts using cast()
- Replace jmespath with plain list comprehension in _parse_payload
- Rewrite stream parsers using typed ChatCompletionChunk / event types,
  removing all hasattr/getattr fallbacks and type: ignore comments
- Make OpenAIResponseStreamEndpoint inherit from OpenAIResponseEndpoint,
  deduplicating _parse_payload and create_payload
- Use collections.abc.Sequence instead of typing.Sequence
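The event-based stream parsing this refactor describes can be sketched as follows. This is a hedged illustration using duck typing (the `collect_stream_output` name is invented); the real parser works with the SDK's typed event classes, but the event type strings follow the OpenAI Responses streaming protocol:

```python
# Accumulate text deltas and capture the final response object from a
# Responses API event stream (response.output_text.delta carries a text
# fragment in .delta; response.completed carries the full .response).
def collect_stream_output(events):
    chunks = []
    final_response = None
    for event in events:
        if event.type == "response.output_text.delta":
            chunks.append(event.delta)
        elif event.type == "response.completed":
            final_response = event.response
    return "".join(chunks), final_response
```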